Zeitgeist: A Computational Model of Neologism Processing
Abstract
Language is a dynamic landscape in which words are not fixed landmarks but fickle signposts that switch their directions as archaic senses are lost and new, more topical senses are gained. Frequently, entirely new lexical signposts are added as newly minted word forms enter the language. Some of these new forms are cut from whole cloth and have their origins in creative writing, movies or games. But many are patchwork creations whose origins can be traced to a blend of existing word forms (e.g., Dent, 2005). This latter form of neologism is of particular interest to the computational lexicographer, since such words possess an obviously compositional structure from which one can begin to infer meaning. In this paper, we demonstrate that, given enough structural context, an automated system can assign a sufficiently rich semantic structure to these words to allow their broad meanings to be automatically incorporated into an electronic dictionary like WordNet (Miller, 1995). When tied to a system for harvesting new word forms from topical internet resources like Wikipedia, this capability allows for a dynamic computational lexicon that grows itself in response to a changing language and cultural context. We present a computational model of new-word formation, called Zeitgeist, that employs a collection of word-formation schemata to harvest previously unseen neologisms from Wikipedia. We further describe how these schemata exploit the semantic context provided by Wikipedia’s topology of cross-references to automatically assign meanings to novel portmanteau words. Because this topological context often fails to deterministically capture the precise meaning of a new word, Zeitgeist also employs a computational instantiation of the cognitive-linguistic (CL) notions of hedging (e.g., Lakoff, 1987) and blending (e.g., Veale and O’Donoghue, 2000). We thus argue that a CL perspective on word formation is necessary even in the context of a practical application such as Zeitgeist. Indeed, we report empirical results that suggest Zeitgeist’s CL approach to word formation yields a surprisingly robust model of neologism recognition and interpretation.
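To make the idea of a word-formation schema concrete, the following is a minimal sketch of one hypothetical schema, not Zeitgeist's actual implementation. It assumes a candidate portmanteau is the title of an article and that the terms cross-referenced from that article are available as plain strings. The function name `blend_candidates`, the `min_part` parameter, and the example words are all illustrative assumptions. The schema proposes that a term is a blend when a prefix of one linked term and a suffix of another linked term together spell it out.

```python
from typing import Iterable, List, Tuple


def blend_candidates(
    term: str, linked_terms: Iterable[str], min_part: int = 3
) -> List[Tuple[str, str]]:
    """Return (head_source, tail_source) pairs from `linked_terms` such that
    a prefix of head_source plus a suffix of tail_source spells `term`.

    Hypothetical schema for illustration only: each part must be at least
    `min_part` characters long, and the term itself is never used as a source.
    """
    term = term.lower()
    # Normalise and de-duplicate the link context; sort for stable output.
    linked = sorted({t.lower() for t in linked_terms} - {term})
    results: List[Tuple[str, str]] = []
    # Try every split point that leaves both parts long enough.
    for cut in range(min_part, len(term) - min_part + 1):
        head, tail = term[:cut], term[cut:]
        for a in linked:
            if not a.startswith(head):
                continue
            for b in linked:
                if b != a and b.endswith(tail):
                    results.append((a, b))
    # Remove duplicate pairs while preserving discovery order.
    return list(dict.fromkeys(results))


if __name__ == "__main__":
    # Illustrative link context, not data from the paper.
    print(blend_candidates("gastropub", ["gastronomy", "pub", "restaurant", "beer"]))
```

Restricting the source words to the cross-referenced terms is what lets the link context constrain the interpretation: a split is accepted only when both of its source words are already associated with the new term. A fuller model would still need to hedge the resulting reading, since the pair of source words alone does not fix the precise meaning.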
Similar resources
Tracking the Lexical Zeitgeist with WordNet and Wikipedia
Most new words, or neologisms, bubble beneath the surface of widespread usage for some time, perhaps even years, before gaining acceptance in conventional print dictionaries [1]. A shorter, yet still significant, delay is also evident in the life-cycle of NLP-oriented lexical resources like WordNet [2]. A more topical lexical resource is Wikipedia [3], an open-source community-maintained encycl...
Parleda: a Library for Parallel Processing in Computational Geometry Applications
ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...
A Computational Approach to the Automation of Creative Naming
In this paper, we propose a computational approach to generate neologisms consisting of homophonic puns and metaphors based on the category of the service to be named and the properties to be underlined. We describe all the linguistic resources and natural language processing techniques that we have exploited for this task. Then, we analyze the performance of the system that we have developed. ...
Nehovah: A Neologism Creator Nomen Ipsum
In this paper, we describe a system called Nehovah that generates neologisms from a set of base words provided by a user. Nehovah focuses on creating “good” neologisms by evaluating various attributes of a neologism such as how well it communicates the source concepts and how “catchy” it is. Because Nehovah depends on the user to weight the importance of various attributes of the neologism and ...
Computational Linguistics: What About the Linguistics?
Three times a year I get my copy of a wholly respectable mainstream linguistics journal. Its scholarly articles are rich in examples from varied languages, and alongside these detailed analyses it advances theoretical claims and counterclaims. Its many reviews point to much more of the same. But this journal content is of interest here for another reason than these scholarly ones. First, refere...